{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Advanced Custom Primitives Guide"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from featuretools.primitives import TransformPrimitive\n",
    "from featuretools.tests.testing_utils import make_ecommerce_entityset\n",
    "from woodwork.column_schema import ColumnSchema\n",
    "from woodwork.logical_types import Datetime, NaturalLanguage\n",
    "import featuretools as ft\n",
    "import numpy as np\n",
    "import re"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Primitives with Additional Arguments\n",
    "\n",
    "Some features require more advanced calculations than others. Advanced features usually entail additional arguments to help output the desired value. With custom primitives, you can use primitive arguments to help you create advanced features.\n",
    "\n",
    "### String Count Example\n",
    "\n",
    "In this example, you will learn how to make custom primitives that take in additional arguments. You will create a primitive to count the number of times a specific string value occurs inside a text.\n",
    "\n",
    "First, derive a new transform primitive class using `TransformPrimitive` as a base. The primitive will take in a text column as the input and return a numeric column as the output, so set the input type to a Woodwork `ColumnSchema` with logical type `NaturalLanguage` and the return type to a Woodwork `ColumnSchema` with the semantic tag `'numeric'`. The specific string value is the additional argument, so define it as a *keyword* argument inside `__init__`. Then, override `get_function` to return a primitive function that will calculate the feature.\n",
    "\n",
    "Featuretools' primitives use Woodwork's `ColumnSchema` to control the input and return types of columns for the primitive. For more information about using the Woodwork typing system in Featuretools, see the [Woodwork Typing in Featuretools](../getting_started/woodwork_types.ipynb) guide."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class StringCount(TransformPrimitive):\n",
    "    '''Count the number of times the string value occurs.'''\n",
    "    name = 'string_count'\n",
    "    input_types = [ColumnSchema(logical_type=NaturalLanguage)]\n",
    "    return_type = ColumnSchema(semantic_tags={'numeric'})\n",
    "\n",
    "    def __init__(self, string=None):\n",
    "        self.string = string\n",
    "\n",
    "    def get_function(self):\n",
    "        def string_count(column):\n",
    "            assert self.string is not None, \"string to count needs to be defined\"\n",
    "            # this is a naive implementation used for clarity\n",
    "            counts = [text.lower().count(self.string) for text in column]\n",
    "            return counts\n",
    "\n",
    "        return string_count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now you have a primitive that is reusable for different string values. For example, you can create features based on the number of times the word \"the\" appears in a text. Create an instance of the primitive where the string value is \"the\" and pass the primitive into DFS to generate the features. The feature name will automatically reflect the string value of the primitive."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "es = make_ecommerce_entityset()\n",
    "\n",
    "feature_matrix, features = ft.dfs(\n",
    "    entityset=es,\n",
    "    target_dataframe_name=\"sessions\",\n",
    "    agg_primitives=[\"sum\", \"mean\", \"std\"],\n",
    "    trans_primitives=[StringCount(string=\"the\")],\n",
    ")\n",
    "\n",
    "feature_matrix[[\n",
    "    'STD(log.STRING_COUNT(comments, string=the))',\n",
    "    'SUM(log.STRING_COUNT(comments, string=the))',\n",
    "    'MEAN(log.STRING_COUNT(comments, string=the))',\n",
    "]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Features with Multiple Outputs\n",
    "\n",
    "Some calculations output more than a single value. With custom primitives, you can make the most of these calculations by creating a feature for each output value.\n",
    "\n",
    "### Case Count Example\n",
    "\n",
    "In this example, you will learn how to make custom primitives that output multiple features. You will create a primitive that outputs the count of upper case and lower case letters of a text.\n",
    "\n",
    "First, derive a new transform primitive class using `TransformPrimitive` as a base. The primitive will take in a text column as the input and return two numeric columns as the output, so set the input type to a Woodwork `ColumnSchema` with logical type `NaturalLanguage` and the return type to a Woodwork `ColumnSchema` with semantic tag `'numeric'`. Since this primitive returns two columns, also set `number_output_features` to two. Then, override `get_function` to return a primitive function that will calculate the feature and return a list of columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class CaseCount(TransformPrimitive):\n",
    "    '''Return the count of upper case and lower case letters of a text.'''\n",
    "    name = 'case_count'\n",
    "    input_types = [ColumnSchema(logical_type=NaturalLanguage)]\n",
    "    return_type = ColumnSchema(semantic_tags={'numeric'})\n",
    "    number_output_features = 2\n",
    "\n",
    "    def get_function(self):\n",
    "        def case_count(array):\n",
    "            # this is a naive implementation used for clarity\n",
    "            upper = np.array([len(re.findall('[A-Z]', i)) for i in array])\n",
    "            lower = np.array([len(re.findall('[a-z]', i)) for i in array])\n",
    "            return upper, lower\n",
    "\n",
    "        return case_count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now you have a primitive that outputs two columns. One column contains the count for the upper case letters. The other column contains the count for the lower case letters. Pass the primitive into DFS to generate features. By default, the feature name will reflect the index of the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "feature_matrix, features = ft.dfs(\n",
    "    entityset=es,\n",
    "    target_dataframe_name=\"sessions\",\n",
    "    agg_primitives=[],\n",
    "    trans_primitives=[CaseCount],\n",
    ")\n",
    "\n",
    "feature_matrix[[\n",
    "    'customers.CASE_COUNT(favorite_quote)[0]',\n",
    "    'customers.CASE_COUNT(favorite_quote)[1]',\n",
    "]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Custom Naming for Multiple Outputs\n",
    "\n",
    "When you create a primitive that outputs multiple features, you can also define custom naming for each of those features.\n",
    "\n",
    "### Hourly Sine and Cosine Example\n",
    "\n",
    "In this example, you will learn how to apply custom naming for multiple outputs. You will create a primitive that outputs the sine and cosine of the hour.\n",
    "\n",
    "First, derive a new transform primitive class using `TransformPrimitive` as a base. The primitive will take in the time index as the input and return two numeric columns as the output. Set the input type to a Woodwork `ColumnSchema` with a logical type of `Datetime` and the semantic tag `'time_index'`. Next, set the return type to a Woodwork `ColumnSchema` with semantic tag `'numeric'` and set `number_output_features` to two. Then, override `get_function` to return a primitive function that will calculate the feature and return a list of columns. Also, override `generate_names` to return a list of the feature names that you define."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class HourlySineAndCosine(TransformPrimitive):\n",
    "    '''Returns the sine and cosine of the hour.'''\n",
    "    name = 'hourly_sine_and_cosine'\n",
    "    input_types = [ColumnSchema(logical_type=Datetime, semantic_tags={'time_index'})]\n",
    "    return_type = ColumnSchema(semantic_tags={'numeric'})\n",
    "\n",
    "    number_output_features = 2\n",
    "\n",
    "    def get_function(self):\n",
    "        def hourly_sine_and_cosine(column):\n",
    "            sine = np.sin(column.dt.hour)\n",
    "            cosine = np.cos(column.dt.hour)\n",
    "            return sine, cosine\n",
    "\n",
    "        return hourly_sine_and_cosine\n",
    "\n",
    "    def generate_names(self, base_feature_names):\n",
    "        name = self.generate_name(base_feature_names)\n",
    "        return f'{name}[sine]', f'{name}[cosine]'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now you have a primitive that outputs two columns. One column contains the sine of the hour. The other column contains the cosine of the hour. Pass the primitive into DFS to generate features. The feature name will reflect the custom naming you defined."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "feature_matrix, features = ft.dfs(\n",
    "    entityset=es,\n",
    "    target_dataframe_name=\"log\",\n",
    "    agg_primitives=[],\n",
    "    trans_primitives=[HourlySineAndCosine],\n",
    ")\n",
    "\n",
    "feature_matrix.head()[[\n",
    "    'HOURLY_SINE_AND_COSINE(datetime)[sine]',\n",
    "    'HOURLY_SINE_AND_COSINE(datetime)[cosine]',\n",
    "]]"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Raw Cell Format",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
